May 22, 2018

Tēnā koutou (greetings to you all)

Overview

  • What is Data Science?
    • Common Topics
    • Tools I use
  • Intro to Machine Learning
    • Types of learning
    • Practical ML
  • Getting started with Python
  • Q & A

What is Data Science?

Data science, according to Wikipedia:

"Data science is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms"

Data science can cover any or all of the following:

  • Data visualisation
  • Statistical modelling
  • Machine learning
  • Automation of data-related processes
  • Big data/distributed computing

Multidisciplinary

Technical

Tools for Data Science

Languages

  • The most popular programming languages for data science are Python and R.
  • Some flavour of SQL is also often necessary.
  • For big data, Scala is popular (but Python and R can handle big data too).
  • Learn Git! GitHub is also awesome.

If you want a recommendation, I'd say Python (but this is a matter of opinion).

In Python

In Python, the most common tools are these:

  • Anaconda is the Python distribution I use
  • Jupyter Notebook/JupyterLab is the IDE I use
  • For data manipulation, I use pandas
  • For machine learning, I use scikit-learn
  • For deep learning, I use TensorFlow and Keras
  • For big data, I use PySpark or, increasingly, Dask
  • For data visualisation, I use Matplotlib, seaborn, Bokeh or Plotly

Also noteworthy: Scrapy for web scraping and spaCy for natural language processing.

Machine Learning

What is machine learning?

From Wikipedia (edited):

Machine learning is a sub-field of computer science and statistics which aims to give computers the ability to progressively improve performance on a specific task with data, without being explicitly programmed.

When it works, the idea is:

  • Collect a lot of data about a problem
  • Point a pattern-finding algorithm at the data
  • …?
  • Profit
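As a toy illustration of "point a pattern-finding algorithm at the data", here is a minimal sketch in plain Python (no libraries) that fits a straight line by ordinary least squares. The function name `fit_line` and the data points are hypothetical, made up for this example; in practice you would likely reach for scikit-learn's `LinearRegression` instead.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    # The fitted line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical "collected" data: roughly y = 2x + 1 with a little noise
xs = [0, 1, 2, 3, 4, 5]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.0]

intercept, slope = fit_line(xs, ys)
print(f"y ≈ {intercept:.2f} + {slope:.2f}x")
```

The algorithm is never told "the rule is roughly 2x + 1"; it recovers something close to it (slope about 1.97, intercept about 1.08) from the data alone.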

How does a machine learn?

Supervised learning
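In supervised learning the training data comes with the right answers (labels), and the model learns to predict the label for new inputs. One of the simplest supervised algorithms is k-nearest neighbours; the sketch below is my own illustration in plain Python, not something taken from the slide. The function `knn_predict`, the points and the labels "A"/"B" are hypothetical; scikit-learn's `KNeighborsClassifier` is the library equivalent.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest labelled neighbours."""
    # Distance from the query to every labelled training example
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    # Majority vote over the labels of the k closest examples
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled data: two groups of 2-D points
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))
print(knn_predict(train_points, train_labels, (8.5, 8.5)))
```

`math.dist` needs Python 3.8 or later.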

Unsupervised learning
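In unsupervised learning there are no labels at all; the algorithm has to find structure, such as groups, on its own. A classic example is k-means clustering. The sketch below is a minimal one-dimensional version in plain Python, assuming a naive initialisation (the first k values become the starting centroids); the function name `kmeans_1d` and the data are hypothetical, and scikit-learn's `KMeans` is the practical choice.

```python
def kmeans_1d(values, k, iterations=10):
    """Group unlabelled numbers into k clusters with Lloyd's k-means algorithm."""
    # Naive initialisation: use the first k values as starting centroids
    centroids = list(values[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled measurements: nobody says which group each belongs to
values = [1.0, 10.0, 1.5, 2.0, 11.0, 10.5]
centroids, clusters = kmeans_1d(values, k=2)
print(centroids, clusters)
```

The two groups it finds (around 1.5 and around 10.5) were never given to it; the cluster numbers 0 and 1 are arbitrary names.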

Reinforcement learning
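In reinforcement learning an agent is not shown correct answers; it acts, receives rewards, and learns by trial and error which actions pay off. The simplest setting is the multi-armed bandit (a row of slot machines with hidden payout rates). The sketch below uses an epsilon-greedy strategy, which is my choice of illustration; the function name, the payout probabilities and the parameter values are all hypothetical.

```python
import random


def epsilon_greedy_bandit(true_probs, steps=10_000, epsilon=0.1, seed=0):
    """Learn which arm pays best purely from trial-and-error rewards."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how often each arm has been pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm that currently looks best
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # The environment pays 1 with the arm's hidden probability, else 0
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average reward
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical hidden payout rates; the agent never sees these directly
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(estimates, counts)
```

The `epsilon` parameter trades off exploring (trying arms that might be better) against exploiting (using what has been learned so far); after enough steps the agent spends most of its pulls on the best arm.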

Modern Machine Learning

Feature engineering
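Feature engineering means transforming raw inputs into features that make the pattern easier for a model to find. A minimal sketch, assuming hypothetical data that follows y = 3x² + 1: a straight line fitted to the raw x misses the curve entirely, but adding the engineered feature x² turns the problem into a straight-line fit. The helper `fit_line` is illustrative; scikit-learn's `PolynomialFeatures` automates this kind of transformation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return y_mean - slope * x_mean, slope


# Hypothetical data following y = 3x^2 + 1
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [3 * x ** 2 + 1 for x in xs]

# A straight line on the raw feature x cannot capture the curve
raw_intercept, raw_slope = fit_line(xs, ys)

# Engineered feature: x squared makes the relationship linear
squared = [x ** 2 for x in xs]
eng_intercept, eng_slope = fit_line(squared, ys)

print(f"raw x:      y = {raw_intercept:.1f} + {raw_slope:.1f} * x")
print(f"feature x²: y = {eng_intercept:.1f} + {eng_slope:.1f} * x²")
```

On the raw feature the best line is flat (slope 0, predicting the mean of 13 everywhere); on the engineered feature the same simple model recovers the exact rule, slope 3 and intercept 1.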

Deep Learning
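Deep learning stacks layers of simple units, each computing a weighted sum followed by a non-linearity. The sketch below shows only the forward pass of a tiny two-layer network in plain Python; the weights are hand-picked (a well-known textbook construction), not trained, and all names are hypothetical. It computes XOR, which no single linear layer can represent. In practice libraries such as TensorFlow and Keras learn the weights from data.

```python
def relu(z):
    """Rectified linear unit: the non-linearity applied between layers."""
    return max(0.0, z)


def layer(inputs, weights, biases, activation=None):
    """One fully connected layer: weighted sums of the inputs, then an optional activation."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


# Hand-picked (not trained) weights for a 2-input, 2-hidden-unit, 1-output network
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]


def xor_net(x1, x2):
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B, activation=relu)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))
```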